Running the RLVR Pipeline on Ascend NPU
Last updated: 04/28/2026.
This guide provides a complete end-to-end walkthrough for running the RLVR (Reinforcement Learning with Verifiable Rewards) pipeline on Huawei Ascend NPU, covering environment setup, data preparation, model download, configuration, training launch, monitoring & evaluation, and checkpoint resumption.
Workflow Overview
Running an RLVR task on NPU from scratch involves the following steps:
1. Environment Setup → 2. Data Preparation → 3. Model Preparation → 4. Write Config → 5. Launch Training → 6. Monitor & Evaluate → 7. Resume from Checkpoint
Step 1: Environment Setup
1.1 Hardware & Driver Prerequisites
Ensure the host meets the following requirements:
| Item | Requirement |
|---|---|
| Hardware | Atlas 900 A2 PODc (Ascend 910B1) or Atlas 900 A3 PODc (Ascend 910_9391) |
| Host OS | Ubuntu 22.04 |
| CANN | 8.5.1 |
| Ascend NPU Driver | Installed on the host; `npu-smi info` lists the NPU devices |
| Docker | >= 20.10 |
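Before pulling an image, you can confirm these prerequisites with a small host-side preflight script. The sketch below is not part of ROLL. The function names (`version_ge`, `report`, `check_os`, `check_npu_driver`, `check_docker`, `preflight_check`) are hypothetical, and the Docker version comparison relies on GNU `sort -V`. The script does not check CANN, on the assumption that the CANN toolkit ships inside the image (the tags carry `cann851`, for example `cann851-910b-py311-torch280-vllm0130`).

```bash
#!/usr/bin/env bash
# Hypothetical host-side preflight check for the prerequisites table.
# CANN is not checked here: it is assumed to ship inside the ROLL image.

# version_ge A B: succeed if version A >= version B (GNU `sort -V` ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# report NAME CMD...: run CMD, print an OK/FAIL line, and return non-zero on failure.
report() {
  local name="$1"
  shift
  if "$@"; then
    echo "[OK]   $name"
  else
    echo "[FAIL] $name"
    return 1
  fi
}

# Host OS must be Ubuntu 22.04; os-release is sourced in a subshell so its variables do not leak.
check_os() {
  [ -r /etc/os-release ] && ( . /etc/os-release && [ "${ID:-}" = "ubuntu" ] && [ "${VERSION_ID:-}" = "22.04" ] )
}

# The Ascend driver is treated as usable if `npu-smi info` runs successfully.
check_npu_driver() {
  command -v npu-smi >/dev/null 2>&1 && npu-smi info >/dev/null 2>&1
}

# The Docker daemon must be reachable and report a server version >= 20.10.
check_docker() {
  local v
  v="$(docker version --format '{{.Server.Version}}' 2>/dev/null)" || return 1
  [ -n "$v" ] && version_ge "$v" "20.10"
}

# Run all checks, print a summary, and return non-zero if any check failed.
preflight_check() {
  local failures=0
  report "Ubuntu 22.04 host" check_os || failures=$((failures + 1))
  report "Ascend driver (npu-smi info)" check_npu_driver || failures=$((failures + 1))
  report "Docker >= 20.10" check_docker || failures=$((failures + 1))
  echo "${failures} check(s) failed"
  [ "$failures" -eq 0 ]
}
```

Save the script, source it, and run `preflight_check`. It prints one line per requirement and returns a non-zero status if any check fails, so it can gate later setup steps.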
1.2 Get the Docker Image
Use the pre-built Ascend image that matches your hardware. Official ROLL NPU image tags are available at https://quay.io/repository/ascend/roll?tab=tags. For container launch details, see the Ascend NPU Docker Usage Guide.
```bash
# For A2 hardware
docker pull roll-registry.cn-hangzhou.cr.aliyuncs.com/roll/pytorch:cann851-910b-py311-torch280-vllm0130
docker tag roll-registry.cn-hangzhou.cr.aliyuncs.com/roll/pytorch:cann851-910b-py311-torch280-vllm0130 roll:ascend-a2

# For A3 hardware
docker pull roll-registry.cn-hangzhou.cr.aliyuncs.com/roll/pytorch:cann851-a3-py311-torch280-vllm0130
docker tag roll-registry.cn-hangzhou.cr.aliyuncs.com/roll/pytorch:cann851-a3-py311-torch280-vllm0130 roll:ascend-a3
```
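The Ascend NPU Docker Usage Guide is the authoritative reference for launching the container. As a rough sketch of the usual Ascend pattern, the container needs three things: the per-NPU device nodes (`/dev/davinciN`), the shared management devices, and read-only mounts of the host driver and `npu-smi`. The exact device and mount list below is an assumption drawn from common Ascend container setups, not taken from ROLL, and `build_run_args`/`RUN_ARGS` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: assemble `docker run` arguments that expose Ascend NPUs to a ROLL container.
# The device nodes and host mounts follow a common Ascend container pattern (an assumption);
# verify them against the Ascend NPU Docker Usage Guide.

# build_run_args IMAGE NUM_NPUS
# Fills the global array RUN_ARGS, exposing /dev/davinci0 .. /dev/davinci(NUM_NPUS-1).
build_run_args() {
  local image="$1" num_npus="$2" i
  RUN_ARGS=(run -it --rm)
  # One device node per NPU.
  for ((i = 0; i < num_npus; i++)); do
    RUN_ARGS+=("--device=/dev/davinci${i}")
  done
  # Shared management devices used by the Ascend driver.
  RUN_ARGS+=(--device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc)
  # Host driver, DCMI library, npu-smi, and install info, all mounted read-only.
  RUN_ARGS+=(
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro
    -v /usr/local/dcmi:/usr/local/dcmi:ro
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi:ro
    -v /etc/ascend_install.info:/etc/ascend_install.info:ro
  )
  # Image to run and the command started inside it.
  RUN_ARGS+=("$image" bash)
}
```

For example, `build_run_args roll:ascend-a2 "$(ls /dev/davinci[0-9]* | wc -l)" && docker "${RUN_ARGS[@]}"` passes through every NPU device node present on the host. Building an array rather than a single string keeps each argument intact, even for paths that contain spaces.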
The current repository includes docker/Dockerfile.A2 and docker/Dockerfile.A3 for building custom images. If you maintain a custom image, keep the dependency versions aligned with the pre-built image.
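One way to check that a custom image stays aligned with the pre-built one is to compare the key package versions that `pip freeze` reports in each. In this sketch, the watched package list, the `roll:ascend-a2-custom` tag, and the helper names (`pick_versions`, `compare_freezes`, `freeze_image`) are hypothetical. It also assumes `python3` is on the image's PATH and that no entrypoint intercepts the command.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: compare key package versions between two images' pip-freeze listings.
# The watched package list is an assumption; extend it to whatever your custom image pins.

WATCHED='^(torch|torch[-_]npu|vllm|vllm[-_]ascend|transformers)=='

# pick_versions FILE: keep watched "name==version" lines, lowercase them, map "_" to "-", and sort.
pick_versions() {
  grep -iE "$WATCHED" "$1" | tr '[:upper:]' '[:lower:]' | tr '_' '-' | sort
}

# compare_freezes A B: print differing watched lines; return 1 if any differ.
compare_freezes() {
  local diff_out
  diff_out="$(diff <(pick_versions "$1") <(pick_versions "$2"))"
  if [ -n "$diff_out" ]; then
    echo "$diff_out"
    return 1
  fi
  echo "watched packages aligned"
}

# freeze_image IMAGE: print the image's installed packages (assumes python3 is on PATH).
freeze_image() {
  docker run --rm "$1" python3 -m pip freeze
}
```

For example, assuming the repository root is a suitable build context, build the image with `docker build -f docker/Dockerfile.A2 -t roll:ascend-a2-custom .`. Then run `compare_freezes <(freeze_image roll:ascend-a2) <(freeze_image roll:ascend-a2-custom)`. Package names are normalized so that `torch_npu` and `torch-npu` compare as equal.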